Optimization of Percentage Cube Queries
Authors
Abstract
OLAP cubes are a powerful database technology for joining tables and aggregating data to discover interesting trends. However, OLAP cubes are limited in their ability to uncover fractional relationships on a measure aggregated at multiple granularity levels. One prominent example is the percentage, an intuitive probabilistic metric used in practically every analytic application. With this motivation in mind, we introduce the percentage cube, a generalized data cube that takes percentages as the target aggregated measure. Specifically, in every cuboid the percentage cube shows the fractional relationship on a measure between two kinds of groups. The first are fact table rows grouped by a set of columns (detailed individual groups). The second are their rolled-up aggregation by a subset of those grouping columns (total group). We introduce a minimal query syntax and carefully study query optimization to compute percentage cubes. It turns out that percentage cubes are significantly harder to evaluate than standard data cubes. In addition to the exponential number of cuboids, there is a doubly exponential number of grouping column pairs (grouping columns at the individual level and at the total level) on which percentages are computed. Fortunately, the search space can be pruned with a threshold similar to those used in iceberg queries. Experiments on a DBMS compare our novel query optimizations against existing SQL OLAP window functions. Our benchmark results show that the proposed SQL extension is more abstract, more intuitive, and faster than existing SQL functions for computing percentages on the cube.
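To make the percentage semantics concrete, here is a minimal, naive Python sketch. It assumes the following:

* the measure is aggregated with SUM;
* the total-group columns are a proper subset of the detail-group columns (the empty subset gives the grand total);
* rows are plain dictionaries.

The names `percentage_cuboid` and `percentage_cube` and the sample `rows` are hypothetical. This is a brute-force reference evaluation, not the paper's query syntax or its optimized evaluation strategy.

```python
from collections import defaultdict
from itertools import combinations


def percentage_cuboid(rows, measure, detail_cols, total_cols):
    """Percentage of SUM(measure) for each detail group relative to its total group.

    detail_cols: columns that define the individual (detail) groups.
    total_cols:  subset of detail_cols that defines the rolled-up total group.
    Returns {detail_key: percentage}, or None where the total group sums to zero.
    """
    detail_cols, total_cols = list(detail_cols), list(total_cols)
    if not set(total_cols) <= set(detail_cols):
        raise ValueError("total_cols must be a subset of detail_cols")

    detail_sums = defaultdict(float)
    total_sums = defaultdict(float)
    for row in rows:
        detail_sums[tuple(row[c] for c in detail_cols)] += row[measure]
        total_sums[tuple(row[c] for c in total_cols)] += row[measure]

    # Project each detail key onto total_cols to find the total group it rolls up to.
    positions = [detail_cols.index(c) for c in total_cols]
    result = {}
    for d_key, d_sum in detail_sums.items():
        total = total_sums[tuple(d_key[i] for i in positions)]
        result[d_key] = 100.0 * d_sum / total if total else None
    return result


def percentage_cube(rows, measure, dims):
    """Naively evaluate every (detail columns, proper-subset total columns) pair."""
    cube = {}
    for k in range(1, len(dims) + 1):
        for detail in combinations(dims, k):
            for j in range(k):  # proper subsets only; j == 0 is the grand total
                for total in combinations(detail, j):
                    cube[(detail, total)] = percentage_cuboid(rows, measure, detail, total)
    return cube


rows = [
    {"state": "CA", "city": "LA", "sales": 30},
    {"state": "CA", "city": "SF", "sales": 10},
    {"state": "TX", "city": "Austin", "sales": 60},
]
print(percentage_cuboid(rows, "sales", ["state", "city"], ["state"]))
print(percentage_cuboid(rows, "sales", ["state"], []))
```

The first call reports each city's share of its state's sales: LA is 75.0, SF is 25.0, and Austin is 100.0. The second call reports each state's share of the grand total: CA is 40.0 and TX is 60.0. Even for two dimensions, `percentage_cube` already evaluates five (detail, total) pairs. This brute-force enumeration hints at why the search space grows quickly and why pruning is needed.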
Similar resources
Multiresolution Cube Estimators for Sensor Network Aggregate Queries
In this work we present in-network techniques to improve the efficiency of spatial aggregate queries. Such queries are very common in a sensornet setting, demanding more targeted techniques for their handling. Our approach constructs and maintains multi-resolution cube hierarchies inside the network, which can be constructed in a distributed fashion. In case of failures, recovery can also be pe...
A Clustered Dwarf Structure to Speed Up Queries on Data Cubes
Dwarf is a highly compressed structure that compresses the cube by eliminating semantic redundancies while computing a data cube. Although it has a high compression ratio, Dwarf is slower to query and more difficult to update because of its structural characteristics. Since the original intention of the data cube is to speed up query performance, we propose two novel clusterin...
Conceptual Object Modeling for OLAP Cubes in a Data Warehousing Environment
Datacubes are efficient structures used to represent multidimensional aggregates at various levels. Quite often, multiple datacubes are predefined and computed in order to assist analytical queries in decision support systems. However, being statically defined structures, they suffer from some inherent problems. Firstly, conventional datacubes are highly inefficient when created over sparse data....
Les dépendances fonctionnelles pour la sélection de vues dans les cubes de données (Functional dependencies for view selection in data cubes)
OLAP query processing faces two seemingly contradictory requirements. On the one hand, query processing should be fast (hence query precomputation). On the other hand, queries are assumed to be submitted in an ad hoc manner, which sometimes makes workload-based optimization ineffective (we cannot materialize an infinite number of queries). Thus, in this paper we address the specific case of ...
Summarizing Datacubes: Semantic and Syntactic Approaches
Datacubes are especially useful for efficiently answering queries on data warehouses. Nevertheless, the amount of generated aggregated data is huge compared with the initial data, which is itself very large. Recent research has addressed the issue of summarizing datacubes in order to reduce their size. In this chapter, we present three different approaches. They propose structures which ma...